Neural Networks from Scratch — part 2 Chaining Neurons

Photo by randa marzouk on Unsplash

Welcome again! This is a continuation of the previous guide. If you haven’t seen it, please go through it once. In this guide, we’ll be doing a lot of work. Our main goal is to chain neurons together, meaning the output of one neuron becomes the input of the next. Right now, we will still have a linear chain of neurons (no layers), so the network is still pretty basic. But this lays the important groundwork for the future.

Trying to model a squared relationship

We’ll be using the neuron created in the previous guide. It could successfully track linear relationships between two variables. But, what if we change that relationship?

Suppose we have two variables, x and y, where y depends on the square of x:

import numpy as np

x = np.arange(1, 11)  # the integers 1 through 10
y = x**2 + 6

Its graph looks like this:

If we train the same neuron on this data and plot its output, we get the following:

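import matplotlib.pyplot as plt

pred = [n1.forward(i)[0] for i in x]  # the trained neuron's prediction for each x
plt.plot(x, y)     # actual data
plt.plot(x, pred)  # neuron's output
plt.show()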

It tried its best but failed. A single neuron can only model linear relationships. If we want it to model nonlinear relationships, we need to change something.

Chaining neurons together

Maybe it’s because there’s only one neuron. If we add more neurons, we might be able to get some complexity. Let’s try to implement that.

If we add one more neuron, our network will look like this:

It has two hidden neurons. The equations simply repeat themselves:

$$z_1 = w_1 x + b_1$$
$$z_2 = w_2 z_1 + b_2$$

Notice that we now have four parameters: the weight and bias of the first neuron and the weight and bias of the second neuron. Also notice that the output of the first neuron, z1, becomes the input of the next neuron.
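As a quick illustration of the chain, here is a minimal sketch that uses a plain function instead of the Neuron class. The helper name `neuron` and the parameter values are assumptions made up for this example.

```python
def neuron(x, w, b):
    # A single linear neuron: z = w * x + b
    return w * x + b

# Arbitrary example parameters (not taken from the guide)
w1, b1 = 2.0, 1.0
w2, b2 = 3.0, -4.0

x = 5.0
z1 = neuron(x, w1, b1)   # output of the first neuron
z2 = neuron(z1, w2, b2)  # z1 is fed in as the input of the second neuron

print(z1, z2)  # 11.0 29.0
```

The second call never sees x directly; everything it knows about the input arrives through z1.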

Updating the Backward Function

Our gradient calculation now has twice as many equations. But that’s no big deal since they’re just the same equations repeated twice. We’ll update the backward function in our Neuron class to work for both neurons.

For reference, this was the previous function:

    def backward(self, pred, y, learning_rate):
        # Getting the derivatives
        dLdz = mse_deriv(pred, y)
        dzdw = self.x
        dzdb = 1

        # These are the first and second terms of the gradient vector
        dLdw = -dLdz * dzdw
        dLdb = -dLdz * dzdb

        # Updating our weight and bias
        self.w += learning_rate * dLdw
        self.b += learning_rate * dLdb

Here’s how we’ll do it:

  1. Change Parameters: Take in an error instead of prediction and actual values.
  2. Update Weights and Biases: Use gradient descent.
  3. Propagate Error: Find the error to be sent to the previous layer.

    def backward(self, error, learning_rate):
        # Getting the derivatives
        dzdw = self.input
        dzdb = 1

        # These are the first and second terms of the gradient vector
        dLdw = -error * dzdw
        dLdb = -error * dzdb

        # Updating our weight and bias
        self.w += learning_rate * dLdw
        self.b += learning_rate * dLdb

        # The error of the previous layer
        return self.w

The code is almost the same. I've removed the dLdz = mse_deriv(pred, y) line; that value is now passed in as the error parameter. The only other addition is return self.w, which hands the neuron's weight back to the caller.
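To see the updated function in action, here is a small self-contained sketch. The Neuron class from the previous guide is not reproduced in this part, so the constructor defaults and the forward method below are assumptions; only backward mirrors the code above, including the fact that it returns the weight after the update.

```python
class Neuron:
    def __init__(self, w=0.5, b=0.0):
        # Assumed starting values; the guide's class may initialise differently
        self.w = w
        self.b = b
        self.input = None

    def forward(self, x):
        # Remember the input so backward can use it as dz/dw
        self.input = x
        return self.w * x + self.b

    def backward(self, error, learning_rate):
        dLdw = -error * self.input  # dz/dw is the stored input
        dLdb = -error * 1           # dz/db is 1
        self.w += learning_rate * dLdw
        self.b += learning_rate * dLdb
        return self.w               # passed back to the previous neuron


n = Neuron(w=0.5, b=0.0)
z = n.forward(2.0)     # 0.5 * 2.0 + 0.0 = 1.0
error = 2 * (z - 3.0)  # MSE derivative for a target of 3.0, i.e. -4.0
returned = n.backward(error, learning_rate=0.1)
print(n.w, n.b, returned)
```

Because the prediction (1.0) is below the target (3.0), the error is negative and both the weight and the bias move upward.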

Explanation of Error Propagation

If you're unsure why we return the weight, consider the following reasoning:

The partial of the cost with respect to the weight and bias of the second neuron is already calculated. We need to find the partial of the cost with respect to the previous neuron’s parameters.

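To calculate the loss, we need z2. To calculate z2, we need z1, and to calculate z1, we need w1. So the chain rule gives:

$$\frac{\partial L}{\partial w_1} = \frac{\partial L}{\partial z_2} \cdot \frac{\partial z_2}{\partial z_1} \cdot \frac{\partial z_1}{\partial w_1}$$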

But z1 and z2 are linearly related (look at the formulae again), so:

$$\frac{\partial z_2}{\partial z_1} = w_2$$

This is why we return self.w. This value is returned so that we can send it to the first neuron.
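One way to convince yourself of this reasoning is to compare the chain-rule product against a numerical slope. The sketch below assumes the squared-error loss for a single sample and uses arbitrary parameter values; the function name `loss` is made up for the example.

```python
def loss(w1, b1, w2, b2, x, y):
    # Forward pass through both neurons, then the squared error
    z1 = w1 * x + b1
    z2 = w2 * z1 + b2
    return (z2 - y) ** 2

# Arbitrary example values (not taken from the guide)
w1, b1, w2, b2 = 0.3, 0.1, -0.7, 0.2
x, y = 2.0, 5.0

# Chain rule: dL/dw1 = dL/dz2 * dz2/dz1 * dz1/dw1
z1 = w1 * x + b1
z2 = w2 * z1 + b2
dLdz2 = 2 * (z2 - y)
dz2dz1 = w2  # the value the second neuron hands back
dz1dw1 = x
analytic = dLdz2 * dz2dz1 * dz1dw1

# Central finite difference: nudge w1 slightly and measure the change in loss
eps = 1e-6
numeric = (loss(w1 + eps, b1, w2, b2, x, y) - loss(w1 - eps, b1, w2, b2, x, y)) / (2 * eps)

print(analytic, numeric)
```

The two printed numbers should agree to several decimal places, which is what we expect if the middle term really is w2.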

Training the Network

Here’s what the training code looks like now:

n1 = Neuron()
n2 = Neuron()
learning_rate = 0.0001
epochs = 10000

for epoch in range(epochs):
    total_error = 0  # This is to calculate the average error over all the data

    # Running the training
    for i in range(len(x)):
        # Forward
        z1 = n1.forward(x[i])
        z2 = n2.forward(z1)

        # Error
        total_error += mse(z2, y[i])

        # Backward
        dLdz2 = mse_deriv(z2, y[i])
        # n2.backward returns dz2/dz1 (its weight); scale the error by it
        dLdz1 = dLdz2 * n2.backward(dLdz2, learning_rate)
        n1.backward(dLdz1, learning_rate)

    # Getting the average error
    if epoch % 50 == 0:
        print(f"epoch = {epoch}, error = {round(total_error / len(x), 3)}")

The only change is in the backward part. As mentioned, the mse_deriv call now lives outside the function. I call n2.backward(), multiply its return value by the error, and send the product to n1.backward(). Why?

n2.backward() returns the partial of z2 with respect to z1, which is w2.

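But the full formula was:

$$\frac{\partial L}{\partial w_1} = \frac{\partial L}{\partial z_2} \cdot \frac{\partial z_2}{\partial z_1} \cdot \frac{\partial z_1}{\partial w_1}$$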

So we need to multiply that term by the partial of L with respect to z2 and by the partial of z1 with respect to w1. Our error variable is precisely the first of these, so multiplying it by the returned weight gives the partial of L with respect to z1. The last term, the partial of z1 with respect to w1, is multiplied in inside the first neuron's backward function, so it is already taken care of. You can run the training code to see that the error goes down, which suggests we got the math right.

Refactoring the code to work with any number of neurons

Now that all the math is done, we can refactor the code to work with any number of neurons. First, let's create an MSE class to keep the MSE function and its derivative packaged together. This will replace the mse and mse_deriv functions we created before.

class MSE:
    def forward(self, a, y):
        # Mean of the squared differences
        return np.power(a - y, 2).mean()

    def backward(self, a, y):
        # Derivative of (a - y)^2 with respect to a
        return 2 * (a - y)
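As a quick sanity check, the sketch below calls both methods on a single prediction and compares backward against a numerical slope of forward. The class body repeats the one above so that the snippet runs on its own; the sample values are arbitrary.

```python
import numpy as np


class MSE:
    def forward(self, a, y):
        return np.power(a - y, 2).mean()

    def backward(self, a, y):
        return 2 * (a - y)


loss_fn = MSE()
a, y = 3.0, 5.0  # arbitrary prediction and target

print(loss_fn.forward(a, y))   # 4.0
print(loss_fn.backward(a, y))  # -4.0

# backward should match the slope of forward with respect to the prediction
eps = 1e-6
numeric = (loss_fn.forward(a + eps, y) - loss_fn.forward(a - eps, y)) / (2 * eps)
print(numeric)
```

The negative derivative says the loss would shrink if the prediction increased, which matches a prediction that sits below its target.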

Next, let's create a new class called "Network". It takes in a list of neurons and the loss function (in this case, MSE).

class Network:
    def __init__(self, neurons, loss_function):
        self.neurons = neurons
        self.loss_function = loss_function

Then we can add a forward function, that iteratively runs the forward command of each neuron and sends the output to the next:

class Network:
    ...
    def forward(self, x):
        for n in self.neurons:
            x = n.forward(x)
        return x
    ...

Then we add a backward function for the back-propagation part:

class Network:
    ...
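    def backward(self, error, learning_rate):
        # Walk the chain from the last neuron back to the first
        for n in reversed(self.neurons):
            # Each neuron returns its weight; scale the error by it before passing it on
            error = error * n.backward(error, learning_rate)
    ...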

Finally, we can add a train function:

def train(self, training_data, training_labels, learning_rate, epochs):
    for epoch in range(epochs):
        total_error = 0

        # Running the training
        for i in range(len(training_data)):

            a = self.forward(training_data[i])

            total_error += self.loss_function.forward(a, training_labels[i])

            self.backward(self.loss_function.backward(a, training_labels[i]), learning_rate)

        if epoch % max(1, epochs // 10) == 0:  # report roughly ten times per run
            print(
                f"epoch = {epoch}, error = {round(total_error / len(training_data), 3)}"
            )

In the training loop, we run the forward function to get our predicted value, use the loss function to calculate our error, and then use the backward function to update the network.

Entire Network class:

class Network:
    def __init__(self, neurons, loss_function):
        self.neurons = neurons
        self.loss_function = loss_function

    def forward(self, x):
        for n in self.neurons:
            x = n.forward(x)
        return x

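    def backward(self, error, learning_rate):
        # Walk the chain from the last neuron back to the first
        for n in reversed(self.neurons):
            # Each neuron returns its weight; scale the error by it before passing it on
            error = error * n.backward(error, learning_rate)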

    def train(self, training_data, training_labels, learning_rate, epochs):
        for epoch in range(epochs):
            total_error = 0

            # Running the training
            for i in range(len(training_data)):
                a = self.forward(training_data[i])
                total_error += self.loss_function.forward(a, training_labels[i])
                self.backward(
                    self.loss_function.backward(a, training_labels[i]), learning_rate
                )

            if epoch % max(1, epochs // 10) == 0:  # report roughly ten times per run
                print(
                    f"epoch = {epoch}, error = {round(total_error / len(training_data), 3)}"
                )

We can test out the network:

network = Network([Neuron()], MSE())

network.train(x, y, learning_rate=0.001, epochs=100)

In this case, we create a network with 1 neuron and use the MSE loss function, then we run the train function.

epoch = 0, error = 1702.332
epoch = 10, error = 122.188
epoch = 20, error = 116.771
epoch = 30, error = 111.767
epoch = 40, error = 107.177
epoch = 50, error = 102.969
epoch = 60, error = 99.109
epoch = 70, error = 95.571
epoch = 80, error = 92.327
epoch = 90, error = 89.352

Conclusion

Unfortunately, we have the same problem. You can test the code by adding 5 or 10 neurons in a chain, and the network will always produce a linear output. Each neuron only scales and shifts its input, so a chain of them is still just one scale and one shift. This shows that adding more neurons doesn't by itself add complexity; we need to add something else. In the next guide, we'll look into activation functions to fix this problem. Stay tuned!
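To make the linear-output claim concrete, here is a small sketch that pushes inputs through a chain of five linear neurons and compares the result with a single equivalent neuron. The random weights and the names `chain`, `w_eff` and `b_eff` are assumptions for the example, not part of the Network class.

```python
import numpy as np

rng = np.random.default_rng(0)
ws = rng.normal(size=5)  # weights of five chained neurons (random for the demo)
bs = rng.normal(size=5)  # their biases


def chain(x):
    # Feed the output of each neuron into the next one
    for w, b in zip(ws, bs):
        x = w * x + b
    return x


# Collapse the whole chain into one weight and one bias:
# w * (w_eff * x + b_eff) + b = (w * w_eff) * x + (w * b_eff + b)
w_eff, b_eff = 1.0, 0.0
for w, b in zip(ws, bs):
    w_eff, b_eff = w * w_eff, w * b_eff + b

x = np.arange(1, 11, dtype=float)
print(np.allclose(chain(x), w_eff * x + b_eff))  # True
```

However many neurons are in the chain, the loop always reduces them to a single pair (w_eff, b_eff), which is why the plot stays a straight line.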